A Family of Latent Variable Convex Relaxations for IBM Model 2

Authors

  • Andrei Simion
  • Michael Collins
  • Clifford Stein
Abstract

Recently, a new convex formulation of IBM Model 2 was introduced. In this paper we develop the theory further and introduce a class of convex relaxations for latent variable models which includes IBM Model 2. When applied to IBM Model 2, our relaxation class subsumes the previous relaxation as a special case. As a proof of concept, we study a new relaxation of IBM Model 2 which is simpler than the previous algorithm: the new relaxation relies on nothing more than a multinomial EM algorithm, does not require the tuning of a learning rate, and compares favorably to IBM Model 2 in terms of F-Measure. The ideas presented could be applied to a wide range of NLP and machine learning problems.

Introduction

The IBM translation models (Brown et al. 1993) were the first Statistical Machine Translation (SMT) systems; their primary use in the current SMT pipeline is to seed more sophisticated models which need alignment tableaus to start their optimization procedure. Although there are several IBM Models, only IBM Model 1 can be formulated as a convex optimization problem. The other IBM Models have non-concave objective functions with multiple local optima, and solving a non-convex problem to optimality is typically a computationally intractable task. Recently, using a linearization technique, a convex relaxation of IBM Model 2 was proposed (Simion, Collins, and Stein 2013; 2014). In this work we generalize the methods introduced in (Simion, Collins, and Stein 2013) to yield a richer set of relaxation techniques. Our algorithms have performance comparable to previous work and have the potential for more applications. We make the following contributions in this paper:

  • We introduce a convexification method that may be applicable to a wide range of probabilistic models in NLP and machine learning. In particular, since the likelihood we are optimizing and the metric we are testing against are often not the same (e.g. for alignment tasks we want to maximize F-Measure, but F-Measure is not directly part of the likelihood function), different relaxations should potentially be considered for different tasks. The crux of our approach relies on approximating the product function ∏_{i=1}^n x_i with a concave function, and as a supplement we present some theoretical analysis characterizing concave functions h that approximate this function.

  • As a specific application, we introduce a generalized family of convex relaxations for IBM Model 2. Essentially, the relaxation is derived by replacing the product t(f_j | e_i) × d(i | j) with h(t(f_j | e_i), d(i | j)), where h(x_1, x_2) is a concave upper envelope for x_1 x_2. We show how our results encompass the work of (Simion, Collins, and Stein 2013) as a special case.

  • We detail an optimization algorithm for a particularly simple relaxation of IBM Model 2. Unlike the previous work in (Simion, Collins, and Stein 2013), which relied on an exponentiated subgradient (EG) optimization method and required the tuning of a learning rate, this relaxation can be approached in a much simpler fashion and can be optimized by an EM algorithm that is very similar to the one used for IBM Models 1 and 2. We show that our method achieves a performance very similar to that of IBM Model 2 seeded with IBM Model 1.

Notation. Throughout this paper, for any positive integer N, we use [N] to denote {1 . . . N} and [N]_0 to denote {0 . . . N}. We denote by R^n_+ and R^n_++ the sets of nonnegative and strictly positive n-dimensional vectors, respectively. We denote by [0, 1]^n the n-dimensional unit cube.

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
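The abstract leaves the envelope function h abstract. As a hedged illustration of the property it must satisfy, the sketch below numerically checks that two candidate surrogates — min(x_1, x_2), the linearization used in the prior relaxation, and the geometric mean √(x_1 x_2), another natural choice — are concave upper bounds of the product x_1 x_2 on the unit square. The function names and the grid-based tests are illustrative assumptions, not code from the paper.

```python
import itertools
import math

# Two candidate concave upper envelopes h(x1, x2) for the product x1 * x2
# on [0, 1]^2. h_min is the linearization from prior work on IBM Model 2;
# h_geom (geometric mean) is another valid concave upper bound.
def h_min(x1, x2):
    return min(x1, x2)

def h_geom(x1, x2):
    return math.sqrt(x1 * x2)

def is_upper_bound(h, steps=20):
    """Check h(x1, x2) >= x1 * x2 on a grid over [0, 1]^2."""
    pts = [i / steps for i in range(steps + 1)]
    return all(h(a, b) >= a * b - 1e-12
               for a, b in itertools.product(pts, repeat=2))

def is_midpoint_concave(h, steps=10):
    """Necessary condition for concavity: the value at the midpoint of any
    two grid points dominates the average of the endpoint values."""
    pts = [i / steps for i in range(steps + 1)]
    grid = list(itertools.product(pts, repeat=2))
    for (a1, a2), (b1, b2) in itertools.product(grid, repeat=2):
        mid = h((a1 + b1) / 2, (a2 + b2) / 2)
        if mid < (h(a1, a2) + h(b1, b2)) / 2 - 1e-12:
            return False
    return True

for name, h in [("min", h_min), ("geometric mean", h_geom)]:
    print(name, is_upper_bound(h), is_midpoint_concave(h))
```

Both candidates pass both checks: since x_1, x_2 ≤ 1, the product x_1 x_2 is at most each factor (hence at most their minimum) and at most its own square root, and both min and the geometric mean are concave. Replacing t(f_j | e_i) × d(i | j) with either surrogate therefore yields an objective that upper-bounds the original likelihood term while remaining concave in the parameters.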

Similar resources

Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems

In this work we study convex relaxations of quadratic optimisation problems over permutation matrices. While existing semidefinite programming approaches can achieve remarkably tight relaxations, they have the strong disadvantage that they lift the original n×n-dimensional variable to an n2×n2-dimensional variable, which limits their practical applicability. In contrast, here we present a lifti...


Convex Relaxations of Latent Variable Training

We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we t...


Nonconvex Global Optimization for Latent-Variable Models

Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method ...


Learning with Relaxed Supervision

For weakly-supervised problems with deterministic constraints between the latent variables and observed output, learning necessitates performing inference over latent variables conditioned on the output, which can be intractable no matter how simple the model family is. Even finding a single latent variable setting that satisfies the constraints could be difficult; for instance, the observed ou...


Using multivariate generalized linear latent variable models to measure the difference in event count for stranded marine animals

BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information on them to be very important. Therefore, this led to the need to retrieve and understand the data on the event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are g...



Journal:

Volume   Issue

Pages  -

Publication date: 2015